Day 15 - Regular expressions - Anchors
– Okay, here. Stop. Throw anchor.
The Karate Kid (1984)
This is day 3 of your journey into regular expressions. Are you ready for some more magic? Today
we will learn how to use anchors. No, we are not setting boats on fire, what did you understand?
We are going to discover how to tell a regular expression where the text is and how to manage
repetitions.
Let’s start with anchors. You might like to know that the HTML tag <a>, commonly used to include
hypertext links, comes from anchor, as it was a way to anchor an element to a specific place in the
markup. Those were old times, and things are pretty different now. Anyway, this has nothing to do
with regular expressions, except that we often need to specify where some text has to be
found.
Before we dive into positioning, however, I want to discuss text that spans multiple lines. Generally
speaking, the separation of a text into lines is a pure convention. We agree that the characters
\n represent a new line, and this convention comes from the C language that wanted to ensure
maximum portability of the code, as the ASCII hexadecimal value 0x0a might not be recognised by
some systems.
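If you want to see that a line break is just one more character in the text, here is a minimal sketch. The sample names are made up for illustration, and I am assuming a POSIX od is available on your system.

```shell
# printf turns the escape \n into a single newline character;
# od -An -tx1 prints every byte as hexadecimal, so each newline appears as 0a
printf 'Anna\nBob\n' | od -An -tx1

# wc -l counts exactly those newline characters
printf 'Anna\nBob\n' | wc -l
```

The two names end up on two lines only because the tools that display the text agree to break the line wherever they find that byte.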
Text on a new line is usually considered separated from the previous one, or at least identifying a
different entry in a list. For example, when we make a list of people in a group, we generally put two
different people on two different lines, which avoids any possible misunderstanding between names,
middle names, and surnames.
So, Unix tools generally work line by line, as we already noticed with grep. This is not different
when we use regular expressions, as they are applied by tools that read and process a text line by
line. There are ways to apply a regular expression to a multiline text, but I leave this for a future
lesson.
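A small sketch of what line by line means in practice. The list of people is hypothetical, invented just for this example.

```shell
# Hypothetical sample text: three lines, one person per line
people='Anna Smith
Bob Jones
Carla Smith'

# grep tests each line on its own and prints only the lines that match
printf '%s\n' "$people" | grep 'Smith'
```

With this input grep should print the two Smith lines and skip the one in the middle, because each line is checked separately from its neighbours.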
Let’s go back to anchors. When we want to find a string in a longer text, the two most frequent
cases we will encounter are when the string is at the beginning of the line and when the string is at
the end of it. The syntax of regular expressions provides two special characters to signal this, aptly
named anchors, which are ^ (caret, or hat) and $ (dollar). Please note that the $ used by some shells
as prompt (and used in the examples of this book) is not related to the $ anchor at all.
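To make the two anchors concrete, here is a minimal sketch with grep. The sample lines are hypothetical, chosen only so that the same word appears in different positions.

```shell
# Hypothetical sample text, one entry per line
lines='apple pie
pineapple
green apple'

# ^ anchors the match to the beginning of the line
printf '%s\n' "$lines" | grep '^apple'

# $ anchors the match to the end of the line; the single quotes
# stop the shell from treating $ as the start of a variable name
printf '%s\n' "$lines" | grep 'apple$'
```

The first command should match only the line that starts with apple, while the second should match the two lines that end with it.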
I think you remember the first regular expression that we learned